Method and apparatus for modeling mass spectrometer lineshapes

ABSTRACT

Methods and apparatuses are disclosed that model the lineshapes of mass spectrometry data. Ions can be modeled with an initial distribution that models molecules as having multiple positions and/or energies prior to traveling in the mass spectrometer. These initial distributions can be pushed forward by time-of-flight functions. Fitting can be performed between the modeled lineshapes and empirical data. Filtering can greatly reduce the dimensions of the empirical data, remove noise, compress the data, and/or recover lost and/or damaged data.

BACKGROUND OF THE INVENTION

[0001] Mass spectrometry can be applied to the search for significant signatures that characterize and diagnose diseases. These signatures can be useful for the clinical management of disease and/or the drug development process for novel therapeutics. Some areas of clinical management include detection, diagnosis and prognosis. More accurate diagnostics may be capable of detecting diseases at earlier stages.

[0002] A mass spectrometer can histogram a number of particles by mass. Time-of-flight mass spectrometers, which can include an ionization source, a mass analyzer, and a detector, can histogram ion gases by mass-to-charge ratio. Time-of-flight instruments typically put the gas through a uniform electric field for a fixed distance. Regardless of mass or charge all molecules of the gas pick up the same kinetic energy. The gas floats through an electric-field-free region of a fixed length. Since lighter masses have higher velocities than heavier masses given the same kinetic energy, a good separation of the time of arrival of the different masses will be observed. A histogram can be prepared for the time-of-flight of particles in the field-free region, determined by mass-to-charge ratio.

[0003] Mass spectrometry with and without separations of serum samples produces large datasets. Analysis of these datasets can lead to biostate profiles, which are informative and accurate descriptions of biological state, and can be useful for clinical decision making. Large biological datasets usually contain noise as well as many irrelevant data dimensions that may lead to the discovery of poor patterns.

[0004] When analyzing a complex mixture, such as serum, that probably contains many thousands of proteins, the resulting spectral peaks show perhaps a mere hundred proteins. Also, with a large number of molecular species and a mass spectrometer with a finite resolution, the signal peaks from different molecular species can overlap. Overlapping signal peaks make different molecular species harder to differentiate, or even indistinguishable. Typical mass spectrometers can measure approximately 5% of the ionized protein molecules in a sample.

[0005] Performing analysis on raw data can be problematic, leading to unprincipled analysis of both data points and peaks. Raw data analysis can treat each data point as an independent entity. However, the intensity at a data point may be due to overlapping peaks from several molecular species. Adjacent data points can have correlated intensities, rather than independent intensities. Ad hoc peak picking involves identifying peaks in a spectrum of raw data and collapsing each peak into a single data point.

[0006] Mass spectra of simple mixtures, such as some purified proteins, can be resolved relatively easily, and peak heights in such spectra can contain sufficient information to analyze the abundance of species detected by the mass spectrometer (which is proportional to the concentration of the species in the gas-phase ion mixture). However, the mass spectra of sera or other complex mixtures can be more problematic. A complex mixture can contain many species within a small mass-to-charge window. The intensity value at any given data point may have contributions from a number of overlapping peaks from different species. Overlapping peaks can cause difficulties with accurate mass measurements, and can hide differences in mass spectra from one sample to the next. Accurate modeling of the lineshapes, or shapes of the peaks, can enhance the reliability and accuracy of the analysis of mass spectra of complex biological mixtures. Lineshape models, or models of the peaks, can also be called modeled mass-to-charge distributions.

[0007] Signal processing can aid the discovery of significant patterns from the large volume of datasets produced by separations-mass spectrometry. Mass spectral signal processing can address the resolution problem inherent in mass spectra of complex mixtures. Pattern discovery can be enhanced by signal processing techniques that remove noise, remove irrelevant information and/or reduce variance. In one application, these methods can discover preliminary biostate profiles from proteomics or other studies.

[0008] Therefore, it is desirable to reduce the noise and/or dimensionality of datasets, improve the sensitivity of mass spectrometry, and/or process the raw data generated by mass spectrometry to improve tasks such as pattern recognition.

BRIEF SUMMARY OF THE INVENTION

[0009] In some embodiments, molecules can be represented with a modeled mass-to-charge distribution detected by a mass spectrometer. The modeled mass-to-charge distribution can be based on a modeled initial distribution representing the molecules prior to traveling in the mass spectrometer. The modeled initial distribution can represent the molecules as having multiple positions and/or multiple energies and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts. The modeled mass-to-charge distribution of the molecules and an empirical mass-to-charge distribution of the molecules can be compared.

[0010] In some embodiments, molecules can be represented by an analytic expression of a modeled mass-to-charge distribution detected by a mass spectrometer. The modeled mass-to-charge distribution can be based on a modeled initial distribution representing molecules prior to traveling in the mass spectrometer. The modeled initial distribution can represent the molecules as having multiple positions and/or multiple energies and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts.

BRIEF DESCRIPTION OF THE FIGURES

[0011] FIG. 1 is a flowchart illustrating one embodiment of performing signal processing on a mass spectrum.

[0012] FIG. 2 is a flowchart illustrating aspects of some embodiments of performing signal processing on a mass spectrum.

[0013] FIG. 3 is a simple schematic of a time-of-flight mass spectrometer.

[0014] FIG. 4 is a simple schematic of a time-of-flight mass spectrometer with a reflectron.

[0015] FIG. 5 illustrates a probability density function of a pushed forward Gaussian, showing a skew to the right.

[0016] FIG. 6 shows a change of coordinates from (x, z) to (v, θ).

[0017] FIG. 7 shows a mass spectrum.

[0018] FIG. 8 shows an expanded view of FIG. 7.

DETAILED DESCRIPTION OF THE INVENTION

[0019] The number of samples can be quite small relative to the number of data dimensions. For example, disease studies can include, in one case, on the order of 10² patients and 10⁹ data dimensions per sample.

[0020] To lessen the computational burden of pattern recognition algorithms and improve estimation of the significance of a given pattern, dimensionality reduction can be performed on the mass spectrometry data. Signal processing can ensure that processed data contains as little noise and irrelevant information as possible. This increases the likelihood that the biostate profiles discovered by the pattern recognition algorithms are statistically significant and are not obtained purely by chance.

[0021] Dimensionality reduction techniques can reduce the scope of the problem. An important tool of dimensionality reduction is the analysis of lineshapes, which are the shapes of peaks in a mass spectrum.

[0022] Lineshapes, instead of individual data points, can be interpreted in a physically meaningful way. The physics of the mass spectrometer can be used to derive mathematical models of mass spectrometry lineshapes. Ions traveling through mass spectrometers have well-defined statistical behavior, which can be modeled with probability distributions that describe lineshapes. The modeled lineshapes can represent the distribution of the time-of-flight for a given mass/charge (m/z), given factors such as the initial conditions of the ions and instrument configurations.

[0023] For specific mass spectrometer configurations, equations are derived for the flight time of an ion given its initial velocity and position. Next, a probability distribution is assumed of initial positions and/or velocities and/or other initial parameters that affect the time-of-flight based on rigorous statistical mechanical approximation techniques and/or distributions such as gaussians. Formulae are then calculated for the time-of-flight probability distributions that result from the probability-theoretical technique of “pushing forward” the initial position and/or velocity distributions by the time-of-flight equations. Each formula obtained can describe the lineshape for a mass-to-charge species.

[0024] A complex spectrum can be modeled as a mixture of such lineshapes. Using the modeled lineshapes, real spectrometric raw data of an observed mass spectrum can be deconvolved into a more informative description. The modeled lineshapes can be fitted to spectra, and/or residual error minimization techniques can be used, such as optimization algorithms with L2 and/or L1 penalties. Coefficients can be obtained that describe the components of the deconvolved spectrum.
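As a minimal sketch of fitting a mixture of lineshapes, the following Python example solves a two-component least-squares problem with an optional L2 (ridge) penalty. The Gaussian stand-in lineshape, the function names, and the synthetic spectrum are illustrative assumptions, not part of the disclosure; any modeled lineshape could be substituted.

```python
import math

def gaussian_shape(t, center, width):
    """Hypothetical stand-in lineshape; a modeled push-forward density could be used instead."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)

def fit_two_components(times, observed, shapes, ridge=0.0):
    """Least-squares coefficients for two lineshapes, with an optional L2 penalty."""
    f1 = [shapes[0](t) for t in times]
    f2 = [shapes[1](t) for t in times]
    # Normal equations (A^T A + ridge * I) c = A^T y for the 2-column design matrix.
    a11 = sum(a * a for a in f1) + ridge
    a22 = sum(b * b for b in f2) + ridge
    a12 = sum(a * b for a, b in zip(f1, f2))
    b1 = sum(a * y for a, y in zip(f1, observed))
    b2 = sum(b * y for b, y in zip(f2, observed))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Synthetic spectrum: two overlapping peaks with known abundances 3.0 and 1.5.
times = [i * 0.1 for i in range(200)]
shapes = (lambda t: gaussian_shape(t, 9.0, 1.0),
          lambda t: gaussian_shape(t, 11.0, 1.0))
observed = [3.0 * shapes[0](t) + 1.5 * shapes[1](t) for t in times]

c1, c2 = fit_two_components(times, observed, shapes)
print(c1, c2)
```

With noise-free synthetic data and no penalty, the recovered coefficients match the abundances used to build the spectrum even though the two peaks overlap.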

[0025] Thus, data dimensions that describe a given peak can be collapsed into a simpler record that gives, for example, the center of the peak and the total intensity of the peak. In some cases, a broad peak in a spectrum can be replaced with much less data, which can be several m/z data points or a single m/z data point that represents the observed component's abundance in the spectrometer, which in turn is correlated with the abundance of the observed component in the original sample.

[0026] Filtering techniques (e.g., hard thresholding, soft thresholding and/or nonlinear thresholding) can be performed to de-noise and/or compress data. The processed data, with noise removed and/or having reduced dimensionality, can be one or more orders of magnitude smaller than the original raw dataset. Thus, the original raw dataset can be decomposed into chemically meaningful elements, despite the artifacts and broadening introduced by the mass spectrometer. Even in instances where peaks overlap such that they are visually indiscernible, this method can be applied to decompose the spectrum. The processed data may be roughly physically interpretable and can be much better suited for pattern recognition, due to significantly less noise, fewer data dimensions, and/or a more meaningful representation of charged states, isotopes of particular proteins, and/or chemical elements that relate to the abundance of different molecular species.
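The hard and soft thresholding mentioned above can be sketched as follows. This is an illustrative example only: the function names, the coefficient values, and the threshold 0.5 are assumptions chosen for the demonstration, and the coefficients stand in for fitted lineshape abundances.

```python
def hard_threshold(coeffs, lam):
    """Keep coefficients whose magnitude exceeds lam; zero the rest."""
    return [c if abs(c) > lam else 0.0 for c in coeffs]

def soft_threshold(coeffs, lam):
    """Shrink every coefficient toward zero by lam; small ones become zero."""
    out = []
    for c in coeffs:
        mag = abs(c) - lam
        if mag <= 0:
            out.append(0.0)
        else:
            out.append(mag if c > 0 else -mag)
    return out

# Hypothetical fitted coefficients: two real peaks plus small noise terms.
coeffs = [5.0, 0.3, -0.2, 2.0, 0.05]
hard = hard_threshold(coeffs, 0.5)
soft = soft_threshold(coeffs, 0.5)
print(hard)
print(soft)
```

Both filters zero the three small coefficients, so only two nonzero values need to be stored; soft thresholding additionally shrinks the surviving values by the threshold.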

[0027] When applied to processed data, such pattern recognition methods identify proteins which may be indicative of disease, and/or aid in the diagnosis of disease in people and quantify their significance. Finding the proteins and/or making a disease diagnosis can be based at least partly on the modeled mass-to-charge distribution.

[0028] FIG. 1 is a flowchart illustrating one embodiment of performing signal processing on a mass spectrum. In 110, a modeled mass-to-charge distribution represents molecules that have traveled through a mass spectrometer. The modeled mass-to-charge distribution is based on at least a modeled initial distribution of any parameter affecting time-of-flight representing the molecules prior to traveling in the mass spectrometer. In 120, the modeled mass-to-charge distribution is compared with an empirical mass-to-charge distribution. Various embodiments can add, delete, combine, rearrange, and/or modify parts of this flowchart.

[0029] FIG. 2 is a flowchart illustrating aspects of some embodiments of performing signal processing on a mass spectrum. In 210, a modeled initial distribution of one or more parameters affecting time-of-flight represents molecules prior to traveling in the mass spectrometer. In 220, the modeled initial distribution is pushed forward by time-of-flight functions. The modeled distribution is thereby based at least partly on the modeled initial distribution. In 230, a mass spectrometer detects an empirical distribution of molecules. This empirical distribution and the modeled distribution can be compared. In 240, a fit is performed between the empirical and modeled distributions. In 250, the fit is filtered. Various embodiments can add, delete, combine, rearrange, and/or modify parts of this flowchart.

[0030] Simple Mass Spectrometer Analyzer Configuration

[0031] FIG. 3 illustrates a simple schematic of a time-of-flight mass spectrometer. In a simple case, the mass analyzer has two chambers: the extraction region 310 and the drift region 320 (also called the field-free region), at the end of which is the detector 330. The flight axis 340 extends from the extraction chamber to the detector. One example of the effect of location in the extraction region on the time-of-flight of an ion is illustrated. Ion 360 is closer to the back of the extraction chamber than ion 370. Ion 360 is accelerated for a longer time in the extraction region 310 than ion 370. Ion 360 exits the extraction region 310 with a higher velocity than ion 370. Thus ion 360 reaches the detector 330 before ion 370.

[0032] FIG. 4 illustrates a simple schematic of a time-of-flight mass spectrometer with a reflectron. In addition to the extraction region 410, the drift region 420, and the detector 430, a reflectron 440 helps to lengthen the drift region 420 and focus the ions.

[0033] In some embodiments, the full gas content is completely localized in the extraction chamber with negligible kinetic energy in the direction of the flight axis. Other embodiments permit the gas to have some kinetic energy in the direction of the flight axis, and/or have some kinetic energy away from the direction of the flight axis. In another embodiment, the gas ions have an initial spatial distribution within the extraction source. In yet another embodiment, the gas ions have an initial spatial distribution within the extraction source and have some kinetic energy in the direction of the flight axis, and/or have some kinetic energy away from the direction of the flight axis.

[0034] In an ideal case, an extraction chamber has a potentially pulsed uniform electric field E₀ in the direction of the flight axis, and has length s₀. An ion of mass m and charge q that starts at the back of the extraction chamber will pick up kinetic energy E₀ s₀ q while traveling through the electric field. Suppose the field-free region has length D. If the ion has constant energy while in the field-free region, then: $\frac{1}{2}mv^{2} = E_{0}s_{0}q \qquad (1)$

[0035] Other embodiments model an extraction chamber with a uniform electric field in a direction other than the flight axis, and/or an electric field that is at least partly nonuniform and/or at least partly time dependent.

[0036] If t_(D) is the time-of-flight in the field-free region, and ν=D/t_(D) then: $t_{D} = D\sqrt{\frac{m}{2E_{0}s_{0}q}} \qquad (2)$

[0037] If not only the time-of-flight in the field-free region is of interest, but the time spent in the extraction region as well, the velocity can be a function of distance traveled (from the energy gained). If u is the distance traveled, then $v(u) = \sqrt{\frac{2E_{0}uq}{m}}.$

[0038] Both sides of dt=du/ν(u) are integrated: $t_{ext} = \int_{0}^{s_{0}}\sqrt{\frac{m}{2E_{0}uq}}\,du = \sqrt{\frac{m}{2E_{0}s_{0}q}} \cdot 2s_{0}.$

[0039] So the total time-of-flight is t_(tot)=t_(ext)+t_(D): $t_{tot} = \left(D + 2s_{0}\right)\sqrt{\frac{m}{2E_{0}s_{0}q}} \qquad (3)$
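Equations (2) and (3) can be evaluated directly. The sketch below assumes SI units, with q expressed in coulombs; the function names and the numerical values (a singly charged ion of 2000 atomic mass units, a 10⁵ V/m field, a 1 cm extraction length, and a 1 m drift length) are illustrative assumptions rather than parameters of any particular instrument.

```python
import math

AMU = 1.66053906660e-27      # kg per atomic mass unit
E_CHARGE = 1.602176634e-19   # coulombs per elementary charge

def drift_time(m, q, E0, s0, D):
    """Equation (2): time in the field-free region."""
    return D * math.sqrt(m / (2.0 * E0 * s0 * q))

def extraction_time(m, q, E0, s0):
    """Time in the extraction region for an ion starting at rest at the back."""
    return 2.0 * s0 * math.sqrt(m / (2.0 * E0 * s0 * q))

def total_time(m, q, E0, s0, D):
    """Equation (3): total time-of-flight."""
    return (D + 2.0 * s0) * math.sqrt(m / (2.0 * E0 * s0 * q))

# Assumed example values (not taken from the disclosure).
m, q = 2000 * AMU, E_CHARGE
E0, s0, D = 1.0e5, 0.01, 1.0

t_tot = total_time(m, q, E0, s0, D)
print(t_tot)
```

Because the time-of-flight scales with the square root of m/q, quadrupling the mass at fixed charge doubles the flight time.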

[0040] Analogous equations can be derived to represent the ions as they move through other regions of a mass spectrometer.

[0041] With real world conditions, errors in the mass spectrum histogram can be seen, and the time-of-flight of a given species of mass-to-charge can have a distribution with large variance. This can be measured by widths at half-maximum height of peaks that are observed, to generate resolution statistics. The resolution of a given mass-to-charge is m/δm (where m represents mass-to-charge m/q of equation (3) and where “δm” refers to the width at the half-maximum height of the peak).

[0042] Some factors that affect the time-of-flight distributions of a given mass-to-charge species are the initial spatial distribution within the extraction chamber, and the initial kinetic energy (alternatively, initial velocity) distribution in the flight-axis direction, and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts. Other embodiments can represent the initial kinetic energy (alternatively initial velocity) distribution in a direction other than the flight-axis direction.

[0043] Choosing Initial Distributions of Species

[0044] The initial distributions of parameters of an ion species that affect the time-of-flight pushed forward by the time-of-flight functions can be called modeled initial distributions.

[0045] Some embodiments use distributions such as gaussian distributions of initial positions and/or energies (alternatively velocities).

[0046] Other embodiments can use various parametric distributions of initial positions and/or energies. The parameters can result from data fitting and/or by scientific heuristics. Further embodiments rely on statistical mechanical models of ion gases or statistical mechanical models of parameters that affect the time-of-flight. In many cases, the quantity of material in the extraction region is in the pico-molar range (10⁻¹² moles is on the order of 10¹¹ particles) and hence statistics are reliable. An issue is the timescale for the system to reach equilibrium. In some embodiments, equilibrium statistical mechanics can apply if the system converges to equilibrium faster than, e.g. the microsecond range.

[0047] Model of Species Distributed in Position

[0048] Some embodiments have a parametric model of the initial position distribution with a fixed initial energy. The time-of-flight distribution to be observed can be modeled. Let S be a normal random variable with mean s₀ and variance σ_(o)² << s₀. In the following calculations, the distribution of the time-of-flight in the field-free region (t_(D)) is modeled rather than the total time-of-flight (t_(tot)). Other embodiments can model the total time-of-flight, or the time-of-flight in field regions such as constant field regions.

[0049] From (2) the time-of-flight can be a random variable t_(D)(S) and what will be observed in the mass spectrum is the probability density function of t_(D)(S). The peak shape is the density of the push-forward of the N(s₀, σ_(o)²) measure under the map t_(D): R→R. From probability theory, if U=h(S) and h is either increasing or decreasing, then the probability density functions p_(U)(u) and p_(S)(s) are related by $p_{U}(u) = p_{S}\left(h^{-1}(u)\right)\left|\frac{d\left(h^{-1}(u)\right)}{du}\right| \qquad (4)$
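The change-of-variables relation (4) can be illustrated with a small generic helper. This is a sketch under stated assumptions: the helper name `pushforward_density` is hypothetical, and the example distribution (S uniform on [1, 4] with U = √S) is chosen only because its push-forward is easy to verify by hand.

```python
def pushforward_density(p_s, h_inv, dh_inv):
    """Return p_U for U = h(S), given p_S, the inverse h^{-1}, and its derivative."""
    def p_u(u):
        return p_s(h_inv(u)) * abs(dh_inv(u))
    return p_u

# Assumed example: S uniform on [1, 4] and U = sqrt(S), so h^{-1}(u) = u^2.
p_s = lambda s: 1.0 / 3.0 if 1.0 <= s <= 4.0 else 0.0
p_u = pushforward_density(p_s, lambda u: u * u, lambda u: 2.0 * u)

# Midpoint-rule integral of p_U over its support [1, 2]; a density integrates to 1.
n = 10000
width = 1.0 / n
total = sum(p_u(1.0 + (i + 0.5) * width) for i in range(n)) * width
print(total)
```

Here the pushed-forward density is 2u/3 on [1, 2], which is no longer uniform: the derivative factor in (4) reshapes the distribution, which is the mechanism behind the skewed lineshapes derived in this description.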

[0050] In some embodiments, this can be a strictly decreasing function; other embodiments have an increasing function. To simplify notation, let t_(D)=ψ and Z=ψ(S). A constant is defined: $K = D\sqrt{\frac{m}{2E_{0}q}}.$

[0051] From above, the probability density functions p_(Z)(z) and p_(S)(s) are related by $p_{Z}(z) = p_{S}\left(\psi^{-1}(z)\right)\left|\frac{d\left(\psi^{-1}(z)\right)}{dz}\right|$

[0052] Solving for ψ⁻¹(z) and $\frac{d\left(\psi^{-1}(z)\right)}{dz}$

[0053] gives $\psi^{-1}(z) = \frac{K^{2}}{z^{2}} \quad \text{and} \quad \frac{d\left(\psi^{-1}(z)\right)}{dz} = \frac{-2K^{2}}{z^{3}}.$

[0054] In embodiments where the probability density function p_(S)(s) is gaussian then: $p_{S}(s) = \frac{1}{\sqrt{2\pi}\sigma_{0}}\exp\left[-\frac{\left(s - s_{0}\right)^{2}}{2\sigma_{0}^{2}}\right]$

[0055] which gives $p_{Z}(z) = \frac{1}{\sqrt{2\pi}\sigma_{0}}\left|\frac{-2K^{2}}{z^{3}}\right|\exp\left[\left(\frac{-1}{2\sigma_{0}^{2}}\right)\left(\frac{K^{2}}{z^{2}} - s_{0}\right)^{2}\right], \quad \text{for} \quad \frac{K}{\sqrt{2s_{0}}} \leq z < \infty$

[0056] and has a maximum $z = \frac{K}{\sqrt{s_{0}}} = D\sqrt{\frac{m}{2E_{0}s_{0}q}}.$

[0057] By pushing forward a gaussian distribution for the spatial distribution, a skewed gaussian for t_(D)(s) is obtained.

[0058] FIG. 5 shows a probability density function p_(Z)(z) of ions with m/z=2000 and a gaussian spatial distribution N(s₀,σ_(o)²) where σ_(o)=s_(o). A clear skew to the right is shown.
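The position-distributed lineshape of paragraph [0055] can be evaluated numerically as a sanity check. The sketch below uses dimensionless, purely illustrative values (K = 1, s₀ = 1, σ₀ = 0.05) that are assumptions and do not correspond to the parameters of FIG. 5; the function name is likewise hypothetical.

```python
import math

def position_lineshape(z, K, s0, sigma0):
    """Push-forward density p_Z(z) for a Gaussian initial position S ~ N(s0, sigma0^2)."""
    if z < K / math.sqrt(2.0 * s0):
        return 0.0                       # outside the stated range of z
    s = K * K / (z * z)                  # inverse map psi^{-1}(z)
    jacobian = 2.0 * K * K / z ** 3      # |d psi^{-1}(z) / dz|
    gauss = math.exp(-((s - s0) ** 2) / (2.0 * sigma0 ** 2))
    return gauss * jacobian / (math.sqrt(2.0 * math.pi) * sigma0)

# Assumed dimensionless parameters for illustration only.
K, s0, sigma0 = 1.0, 1.0, 0.05
z_center = K / math.sqrt(s0)

# Midpoint-rule integral over a window that contains essentially all of the mass.
lo, hi, n = K / math.sqrt(2.0 * s0), 3.0, 20000
step = (hi - lo) / n
area = sum(position_lineshape(lo + (i + 0.5) * step, K, s0, sigma0) for i in range(n)) * step

# Compare the density at equal offsets on either side of K / sqrt(s0).
right = position_lineshape(z_center + 0.05, K, s0, sigma0)
left = position_lineshape(z_center - 0.05, K, s0, sigma0)
print(area, left, right)
```

For these values the density integrates to approximately one, and it is larger at a given offset to the right of K/√s₀ than at the same offset to the left, consistent with the right skew described above.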

[0059] Thus, it is possible to calculate and/or at least analytically approximate the probability density function of time-of-flight as a function of random variables representing the initial position and/or energy distributions. Some embodiments model simple analyzer configurations such as a single extraction region with a field and a field-free region. Other embodiments model more complicated analyzer configurations.

[0060] Model of Species Distributed in Energy

[0061] In some embodiments, the initial position is constant but the initial kinetic energy in the flight-axis direction has a gaussian distribution.

[0062] In one case, the initial distribution can be given by a N(U_(o),σ_(o)²) random variable U. The time-of-flight in the drift region is given by $t_{D}(u) = \psi(u) = \frac{D\sqrt{2m}}{2\sqrt{u + K}},$ where $K = qE_{0}s_{0}.$ Then $\psi^{-1}(t) = \frac{mD^{2}}{2t^{2}} - K,$ and $\frac{d}{dt}\psi^{-1}(t) = -\frac{mD^{2}}{t^{3}}.$

[0063] The probability distribution of the time-of-flight Z=ψ(U) is $p_{Z}(z) = \frac{1}{\sqrt{2\pi}\sigma_{0}}\frac{mD^{2}}{z^{3}}\exp\left(-\frac{1}{2\sigma_{0}^{2}}\left\{\frac{mD^{2}}{2z^{2}} - K - U_{0}\right\}^{2}\right). \qquad (5)$
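Equation (5) and the inverse map of paragraph [0062] can be checked numerically in the same way. The sketch below works in dimensionless units with assumed values (m = D = K = 1, U₀ = 0.1, σ₀ = 0.02); these numbers and the function names are illustrative assumptions only.

```python
import math

def psi(u, m, D, K):
    """Drift time for an ion with initial kinetic energy u, where K = q * E0 * s0."""
    return D * math.sqrt(2.0 * m) / (2.0 * math.sqrt(u + K))

def psi_inv(t, m, D, K):
    """Initial kinetic energy that produces drift time t."""
    return m * D * D / (2.0 * t * t) - K

def energy_lineshape(t, m, D, K, U0, sigma0):
    """Equation (5): push-forward of U ~ N(U0, sigma0^2) by psi."""
    jacobian = m * D * D / t ** 3
    dev = psi_inv(t, m, D, K) - U0
    gauss = math.exp(-dev * dev / (2.0 * sigma0 ** 2))
    return jacobian * gauss / (math.sqrt(2.0 * math.pi) * sigma0)

# Assumed dimensionless parameters for illustration only.
m, D, K, U0, sigma0 = 1.0, 1.0, 1.0, 0.1, 0.02
t_center = psi(U0, m, D, K)

# Midpoint-rule integral over a window containing essentially all of the mass.
lo, hi, n = 0.3, 1.5, 20000
step = (hi - lo) / n
area = sum(energy_lineshape(lo + (i + 0.5) * step, m, D, K, U0, sigma0) for i in range(n)) * step
print(t_center, area)
```

The round trip through `psi` and `psi_inv` recovers the mean energy, and the density integrates to approximately one over the window.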

[0064] Another Model of Species Distributed in Position

[0065] If y denotes the initial distance of an ion from the beginning of the field-free region (0≦y≦S), and $K = \frac{2qeE_{0}}{m}$

[0066] where

[0067] e is the charge of an electron in Coulombs

[0068] q is the integer charge of the ion

[0069] m is the mass of the ion

[0070] E_(o) is the electric field strength of the extraction region

[0071] then the time-of-flight is

$t_{tof} = t_{ext} + t_{D} \qquad (6)$

[0072] where t_(tof) is the time-of-flight, t_(ext) is the time the ion spends in the extraction chamber, and t_(D) is the time the ion spends in the field-free region. We can show that: $t_{D} = \frac{D}{\sqrt{Ky}}$ and $t_{ext} = \int_{0}^{y}\frac{ds}{v(s)} = \frac{2\sqrt{y}}{\sqrt{K}}$

[0073] Combining the above two terms gives t_(tof): $t_{tof} = \frac{1}{\sqrt{Ky}}\left(2y + D\right) \qquad (7)$

[0074] We suppose that the random variable Y, representing initial position, is distributed as

$Y \sim N(\nu, \tau^{2}).$

[0075] If t_(tof)=F(y), then we need to find y=F⁻¹(t). To this end, equation 7 can be rewritten as:

$\sqrt{Ky}\,t = 2y + D$

[0076] Substituting z²=y gives:

$2z^{2} - \sqrt{K}\,tz + D = 0$

$4z = \sqrt{K}\,t \pm \sqrt{Kt^{2} - 8D}$

$16z^{2} = 2Kt^{2} - 8D \pm 2\sqrt{K}\,t\sqrt{Kt^{2} - 8D}$

[0077] Substituting back in y $y = \frac{2Kt^{2} - 8D \pm 2\sqrt{K}\,t\sqrt{Kt^{2} - 8D}}{16} \qquad (8)$

[0078] Of these two solutions, for physical reasons, the solution with the minus sign can be chosen.
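A round-trip check of equations (7) and (8) is sketched below, taking the root in which the square-root product term is subtracted. The values K = 4, D = 1, and y = 0.01 are dimensionless assumptions chosen so that the arithmetic is easy to follow; the function names are hypothetical.

```python
import math

def tof(y, K, D):
    """Equation (7): total time-of-flight for initial distance y."""
    return (2.0 * y + D) / math.sqrt(K * y)

def initial_position(t, K, D):
    """Equation (8), root with the square-root product term subtracted."""
    disc = K * t * t - 8.0 * D
    if disc < 0:
        raise ValueError("no real initial position gives this time-of-flight")
    return (2.0 * K * t * t - 8.0 * D - 2.0 * math.sqrt(K) * t * math.sqrt(disc)) / 16.0

# Assumed dimensionless values with y well below D / 2.
K, D, y = 4.0, 1.0, 0.01
t = tof(y, K, D)
y_back = initial_position(t, K, D)
print(t, y_back)
```

Equation (7) is decreasing in y only for y below D/2, so this branch of the inverse applies when the extraction distance is short compared with the drift length, which is the regime assumed in this example.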

[0079] Let ψ(t)=Φ(t)=F⁻¹(t) and find the derivative with respect to t $4\frac{d\psi(t)}{dt} = Kt - \frac{K^{2}t^{2} - 4DK}{\sqrt{K^{2}t^{2} - 8KD}} \qquad (9)$

[0080] From equations 8 and 9, the push forward can be calculated as $p_{T}(t) = \frac{\left|\psi^{\prime}(t)\right|}{\tau\sqrt{2\pi}}\exp\left(-\frac{\left(\psi(t) - \nu\right)^{2}}{2\tau^{2}}\right) \qquad (10)$

[0081] Another Model of Species Distributed in Energy

[0082] The push forward for the case with an initial energy distribution can be calculated. Suppose that the random variable X, representing initial velocity, is distributed as $X \sim N(\mu, \sigma^{2}).$ Then $t_{D} = \frac{D}{\sqrt{x^{2} + KS}}$ and $t_{ext} = \frac{2}{K}\left(\sqrt{x^{2} + KS} - x\right).$

[0083] Combining these terms gives an expression for t_(tof):

[0084] $t_{tof} = \frac{D}{\sqrt{x^{2} + KS}} + \frac{2}{K}\left(\sqrt{x^{2} + KS} - x\right) \qquad (6)$

[0085] Substituting $u = \sqrt{x^{2} + KS}$: $2u + \frac{KD}{u} - 2\sqrt{u^{2} - KS} - Kt = 0$

[0086] This can be written as a cubic polynomial in u.

$4tu^{3} - \left(4S + 4D + Kt^{2}\right)u^{2} - 2KDtu + KD^{2} = 0$

[0087] Solving for u and letting A=4(D+S) gives: $u = \frac{1}{12t}\left(A + Kt^{2} + \frac{A^{2} + 2\left(A + 12D\right)Kt^{2} + K^{2}t^{4}}{f(t)^{1/3}} + f(t)^{1/3}\right),$ where $f(t) = A^{3} + 3\left(A^{2} + 12AD - 72D^{2}\right)Kt^{2} + 3\left(A + 12D\right)K^{2}t^{4} + K^{3}t^{6} + 12\sqrt{3}\sqrt{D^{2}Kt^{2}\left(-A^{3} - 4\left(A^{2} + 9AD - 27D^{2}\right)Kt^{2} - \left(5A + 68D\right)K^{2}t^{4} - 2K^{3}t^{6}\right)}$

[0088] Now with Φ, Φ′(t) can also be calculated: $\begin{matrix}{{\psi^{\prime}(t)} = {\frac{1}{12}{t\left( {{2{Kt}} + \frac{{4\left( {A + {12D}} \right){Kt}} + {4K^{2}t^{3}}}{{f(t)}^{1/3}} +} \right.}}} \\\left. {\frac{1}{12}\left( {A + {f(t)}^{1/3} + {Kt}^{2} + \frac{A^{2} + {2\left( {A + {12D}} \right){Kt}^{2}} + {K^{2}t^{4}}}{{f(t)}^{1/3}}} \right)} \right)\end{matrix}$

[0089] Model of Combined Position and Energy

[0090] If ν is the velocity at the start of the field-free region, then the time-of-flight in the field-free region is given by $t_{D} = \frac{D}{v}$ and the inverse by $\psi(t) = \frac{D}{t}$ with derivative $\psi^{\prime}(t) = -\frac{D}{t^{2}}.$

[0091] If p_(V)(ν) is the distribution of velocities at the start of the field-free region, then the corresponding time-of-flight distribution is $p_{T}(t) = \frac{D}{t^{2}}p_{V}\left(\frac{D}{t}\right)$

[0092] General Mass Spectrometer Analyzer Configurations with an Arbitrary Number of Electric Field Regions and Field-Free Regions

[0093] Equations for calculating the time-of-flight of an ion through any system involving uniform electric fields can be derived from the laws of basic physics. Such equations can accurately determine the flight time as a function of the mass-to-charge ratio for any specific instrument, with distances, voltages and initial conditions. The accuracy of such calculations can be limited by uncertainties in the precise values of the input parameters and by the extent to which the simplified one-dimensional model accurately represents the real three-dimensional instrument. Other embodiments can use more than one dimension, such as a two-dimensional or a three-dimensional model.

[0094] Analyzers with electric fields can have at least two kinds of regions: field-free regions, and constant field regions. Velocities of an ion can be traced at different regions to understand the time-of-flight. In an ideal field-free region of length L, an ion's initial and final velocities are the same and therefore the time spent in the region is

$t_{Free} = L/v_{final} = L/v_{initial}$

[0095] In other embodiments that have nonideal field-free regions with changes in velocity in the field-free region, decelerations and/or accelerations can be accounted for in the time spent in the field-free region.

[0096] In a simple constant electric field region, the velocity changes but the acceleration is constant. Using this information, supposing the acceleration (that depends on mass) is a in a region of length L, the time of flight is

$t_{ConstantField} = v_{final}/a - v_{initial}/a.$

[0097] In other embodiments that have nonideal constant electric field regions with nonconstant acceleration, deviations from constant acceleration can be accounted for in the time spent in the constant field region.

[0098] A general formula for total time-of-flight through regions with accelerations a₁, . . . , a_(M) is given by $t = \sum\limits_{k = 1}^{M} t_{k}$

[0099] where $t_{k} = \left\{ \begin{matrix} v_{k}/a_{k} - v_{k-1}/a_{k} & \text{(constant field region)} \\ L_{k}/v_{k-1} & \text{(field-free region)} \end{matrix} \right.$

[0100] The connection between ν_(k-1) and ν_(k) is given by conservation of energy. $v_{k}^{2} - v_{k-1}^{2} = \left\{ \begin{matrix} 0 & \text{(field-free region)} \\ 2a_{k}L_{k} & \text{(constant field region)} \end{matrix} \right.$

[0101] As a step towards simplification, note that $\frac{v_{k}}{a_{k}} - \frac{v_{k-1}}{a_{k}} = \frac{1}{a_{k}}\left(v_{k} - v_{k-1}\right) = \frac{1}{a_{k}}\frac{v_{k}^{2} - v_{k-1}^{2}}{v_{k} + v_{k-1}} = \frac{1}{a_{k}}\frac{2a_{k}L_{k}}{v_{k} + v_{k-1}} = \frac{2L_{k}}{v_{k} + v_{k-1}}.$

[0102] This leads to a unified formula for total time-of-flight: $t = \sum\limits_{k = 1}^{M}\frac{2L_{k}}{v_{k} + v_{k-1}}$

[0103] Next, a simple inductive argument shows $v_{k}^{2} = \sum\limits_{j = 1}^{k}2a_{j}L_{j} + v_{0}^{2}.$

[0104] Letting $P_{k} = \sum\limits_{j = 1}^{k}2a_{j}L_{j},$

[0105] we rewrite the time-of-flight formula as $t = \sum\limits_{k = 1}^{M}\frac{2L_{k}}{\sqrt{P_{k} + v_{0}^{2}} + \sqrt{P_{k-1} + v_{0}^{2}}}. \qquad (6)$
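The unified formula can be compared against the region-by-region formulas with a short script. The representation of an analyzer as a list of (acceleration, length) pairs, the function names, and the numerical values are assumptions made for this sketch; it also assumes a positive entry velocity and no region that stops the ion.

```python
import math

def tof_unified(regions, v0):
    """Sum of 2 * L_k / (v_k + v_{k-1}) over all regions."""
    t, v_prev = 0.0, v0
    for a, L in regions:
        v_next = math.sqrt(v_prev * v_prev + 2.0 * a * L)  # conservation of energy
        t += 2.0 * L / (v_next + v_prev)
        v_prev = v_next
    return t

def tof_piecewise(regions, v0):
    """Separate formulas for field-free (a == 0) and constant-field regions."""
    t, v_prev = 0.0, v0
    for a, L in regions:
        v_next = math.sqrt(v_prev * v_prev + 2.0 * a * L)
        if a == 0.0:
            t += L / v_prev
        else:
            t += (v_next - v_prev) / a
        v_prev = v_next
    return t

# Assumed layout: acceleration stage, field-free drift, second acceleration stage.
regions = [(2.0e9, 0.01), (0.0, 1.0), (5.0e8, 0.02)]
v0 = 100.0

t_unified = tof_unified(regions, v0)
t_piecewise = tof_piecewise(regions, v0)
print(t_unified, t_piecewise)
```

The unified form avoids the division by a_k, so a single expression covers field-free regions as well as constant-field regions.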

[0106] If we collect the initial conditions s_(o) and ν_(o) in one term

$I(s_{0}, v_{0}) = a_{1}s_{0} + v_{0}^{2},$

[0107] then it is clear that we have nonnegative constants Q₁, . . . , Q_(M) such that $t = \psi(I) = \sum\limits_{k = 1}^{M}\frac{1}{\sqrt{Q_{k} + I} + \sqrt{Q_{k-1} + I}}.$

[0108] Taking a derivative shows that this is a strictly decreasing function for I>0 and therefore has an inverse. The derivative of the inverse of this function is of interest: according to (4), such a term affects the push-forward density as a factor, and hence has a strong impact on the shape of the push-forward distribution.

[0109] Next is introduced a procedure for calculating the inverse ψ⁻¹(t) of ψ(I). It can be observed that if

$\sqrt{x + a} - \sqrt{x} = z$

[0110] then $x = \left(\frac{a - z^{2}}{2z}\right)^{2}.$
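The identity above can be confirmed with a small numerical example. The function name and the values a = 16 and z = 2 are illustrative assumptions; they correspond to x = 9, since √25 − √9 = 2.

```python
import math

def solve_x(a, z):
    """Recover x from sqrt(x + a) - sqrt(x) = z."""
    return ((a - z * z) / (2.0 * z)) ** 2

a, z = 16.0, 2.0
x = solve_x(a, z)
residual = math.sqrt(x + a) - math.sqrt(x) - z
print(x, residual)
```

The formula requires z² ≤ a so that the recovered √x is nonnegative.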

[0111] If any of the t₁, . . . , t_(M) were known, then it would be easy to calculate I. In one approach, these t_(k) can be backed out in stages until t is exhausted. The system of quadratic equations includes the following: for each 1≦k≦M: $\left(\frac{a_{k}L_{k} - t_{k}^{2}}{2t_{k}}\right)^{2} - Q_{k} = I,$

[0112] with the constraint that the t_(k) sum to t.

[0113] Lineshapes of a Single-Stage Reflectron Mass Spectrometer

[0114] Some embodiments can be applied to a mass spectrometer including three chambers and a detector: an ion extraction chamber (e.g. rectangular), a field-free drift tube, and a reflectron. The shape of the distribution of the time-of-flight of a single mass-to-charge species can be determined at least partly by the distributions of initial positions in the extraction chamber and/or the initial velocities along the flight-axis.

[0115] Approximate formulae can be derived for the time-of-flight distribution for a species of fixed mass-to-charge ratio, in this example assuming that the distributions for initial positions and velocities are gaussian. The initial positions have restricted range, and the assumption for initial position may be modified to reflect this.

[0116] The plane that separates the extraction region from the field-free drift region can be called the “drift start” plane. For a given ion the flight-axis velocity at the “drift start” plane can be referred to as the “drift start velocity.”

[0117] Basic Formulae

[0118] If x denotes the initial velocity and y denotes the initial distance of an ion from the drift-start plane ($0 \le y \le S$), and $K = \frac{2qeE_{0}}{m}$

[0119] where

[0120] e is the charge of an electron in Coulombs

[0121] q is the integer charge of the ion

[0122] m is the mass of the ion

[0123] $E_{0}$ is the electric field strength of the extraction region, then

$v(x,y) = \sqrt{x^{2} + Ky}.$

[0124] If an ion has drift-start velocity of ν and if

[0125] L₁ is the length of the drift region

[0126] L₂ is the distance from the drift-end plane to the detector

[0127] D=L₁+L₂

[0128] E₁ is the electric field strength of the reflectron, and

[0129] a=qeE₁/m is the acceleration of the ion in the reflectron

[0130] then the time-of-flight of the ion is $T(v) = \frac{D}{v} + \frac{2v}{a}.$
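The basic formulae above can be written as a few small functions. This is a minimal sketch assuming SI units; the function names and the sample arguments are hypothetical and are not part of the described embodiments.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge e in coulombs (exact SI value)

def extraction_constant(q, e0, m):
    """K = 2 q e E0 / m, with q the integer charge, E0 in V/m and m in kg."""
    return 2.0 * q * E_CHARGE * e0 / m

def drift_start_velocity(x, y, k):
    """v(x, y) = sqrt(x^2 + K y) for initial velocity x and initial distance y."""
    return math.sqrt(x * x + k * y)

def reflectron_tof(v, d, a):
    """T(v) = D / v + 2 v / a: drift over total length D plus the turnaround in the reflectron."""
    return d / v + 2.0 * v / a

# Hypothetical dimensionless example values, chosen only to make the arithmetic easy to follow.
v_example = drift_start_velocity(3.0, 4.0, 4.0)
t_example = reflectron_tof(2.0, 4.0, 1.0)
```

Setting dT/dv = 0 shows that T(v) is smallest at v = sqrt(aD/2), so ions on either side of that velocity arrive later; this non-monotonic behavior is one reason the push-forward of a symmetric initial distribution need not be symmetric in time.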

[0131] Given a distribution $p_{XY}$ in the (x, y)-space of initial velocities and positions, the probability density can be determined that results when this distribution is pushed forward by

$(x,y) \mapsto v(x,y).$

[0132] The resulting density in the space of velocities can be denoted by $p_{V}$. Next, T can be used to push forward the density $p_{V}$ to a new density in the t-space

$p_{T} = T \cdot p_{V}.$

[0133] Expression for P_(V) in the Gaussian Case

[0134] Suppose that the random variables X, representing initial velocity, and Y, representing initial position, are distributed as

$X \sim N(\mu, \sigma^{2})$

$Y \sim N(\nu, \tau^{2})$

[0135] The push-forward of $p_{XY}$ under

$v(x,y) = \sqrt{x^{2} + Ky}$

[0136] can be given by integrating the measure $p_{XY}(x,y)\,dx\,dy$ over the fibers

$\mathrm{Fiber}(v) = \{(x,y) : \sqrt{x^{2} + Ky} = v\}.$

[0137] Suppose F(x, y) is any function of x and y. Then $E_{XY}[F] = \int_{x}\int_{y} F(x,y)\,p_{XY}(x,y)\,dy\,dx.$

[0138] Change the variables to $z = \sqrt{Ky}$. Then $dz = \frac{\sqrt{K}}{2\sqrt{y}}\,dy = \frac{K}{2\sqrt{Ky}}\,dy = \frac{K}{2z}\,dy.$

[0139] Therefore, $\frac{2z}{K}\,dz = dy.$

[0140] So $E_{XY}[F] = \int_{x}\int_{z=0}^{z=\sqrt{KS}} F\left(x, \frac{z^{2}}{K}\right) p_{XY}\left(x, \frac{z^{2}}{K}\right) \frac{2}{K}\,z\,dz\,dx.$

[0141] Now change to polar coordinates $(v,\theta)$. Care can be taken with the ranges of θ: when $v \le \sqrt{KS}$ the range of θ is [−π/2, π/2]; however, when $v > \sqrt{KS}$ the range can be broken into two symmetric parts that consist of $[\arccos(\sqrt{KS}/v), \pi/2]$ and its mirror image. Refer to FIG. 6.

[0142] Next, change to polar coordinates $z = v\cos\theta$ and $x = v\sin\theta$ without specifying the limits of θ to get $\begin{aligned} E_{XY}[F] &= \int_{v}\int_{\theta} F\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) p_{XY}\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) \frac{2v}{K}\cos\theta \; v\,d\theta\,dv \\ &= \int_{v} \frac{2v^{2}}{K} \left( \int_{\theta} F\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) p_{XY}\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) \cos\theta\,d\theta \right) dv \end{aligned}$

[0143] Make the change of variables $u = v\sin\theta$ so that the inner integral above becomes $\frac{2}{K}\int_{0}^{v} F\left(u, \frac{v^{2} - u^{2}}{K}\right) p_{XY}\left(u, \frac{v^{2} - u^{2}}{K}\right) du$

[0144] An expression for $p_{V}$ for $v \le \sqrt{KS}$ can be given by $p_{V}(v) = \frac{4v}{K}\int_{0}^{v} p_{XY}\left(u, \frac{v^{2} - u^{2}}{K}\right) du;$

[0145] and for $v \ge \sqrt{KS}$, the range of θ is $[\arccos(\sqrt{KS}/v), \pi/2]$ and change of variables to u yields the range $[\sqrt{v^{2} - KS}, v]$, as is clear from FIG. 6: $p_{V}(v) = \frac{4v}{K}\int_{\sqrt{v^{2} - KS}}^{v} p_{XY}\left(u, \frac{v^{2} - u^{2}}{K}\right) du.$
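The two expressions for the velocity density can be evaluated numerically. The sketch below assumes independent Gaussians with the initial velocity centered at zero (μ = 0), uses a simple midpoint rule in place of any particular quadrature, and picks hypothetical parameter values; it then checks that the resulting density carries essentially all of the probability mass when the position distribution lies well inside [0, S].

```python
import math

def p_xy(x, y, sigma, nu, tau):
    """Joint density of independent X ~ N(0, sigma^2) and Y ~ N(nu, tau^2) (mu = 0 assumed)."""
    gx = math.exp(-x * x / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    gy = math.exp(-((y - nu) ** 2) / (2.0 * tau**2)) / (tau * math.sqrt(2.0 * math.pi))
    return gx * gy

def p_v(v, k, s, sigma, nu, tau, n=400):
    """Midpoint-rule value of (4v/K) * integral of p_XY(u, (v^2 - u^2)/K) du."""
    if v <= 0.0:
        return 0.0
    # Lower limit is 0 for v <= sqrt(K S) and sqrt(v^2 - K S) otherwise.
    lo = 0.0 if v * v <= k * s else math.sqrt(v * v - k * s)
    h = (v - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += p_xy(u, (v * v - u * u) / k, sigma, nu, tau)
    return 4.0 * v / k * total * h

# Hypothetical parameters: positions concentrated near the middle of [0, S].
K_EX, S_EX, SIGMA_EX, NU_EX, TAU_EX = 1.0, 1.0, 0.1, 0.5, 0.1
steps = 400
dv = 2.0 / steps
total_mass = sum(
    p_v((i + 0.5) * dv, K_EX, S_EX, SIGMA_EX, NU_EX, TAU_EX) for i in range(steps)
) * dv
```

With these values the integral of the density over velocities should come out close to one, since almost none of the Gaussian position mass falls outside [0, S].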

[0146] Upper and lower bounds can be explored that lead to an approximation that has accurate decay as $v \to \infty$.

[0147] Approximation of Taylor expansion

$p_{V}(v) = \begin{cases} \frac{4v}{2\pi\sigma K\tau}\int_{0}^{v} e(u,v)\,du & v \le \sqrt{KS} \\ \frac{4v}{2\pi\sigma K\tau}\int_{\sqrt{v^{2} - KS}}^{v} e(u,v)\,du & \sqrt{KS} \le v < \infty \end{cases}$

[0148] where, for $\mu = 0$ and writing $v$ for the drift-start velocity and $\nu$ for the mean of Y, $\begin{aligned} e(u,v) &= \exp\left\{ -\frac{u^{2}}{2\sigma^{2}} - \frac{1}{2\tau^{2}}\left( \frac{v^{2} - u^{2}}{K} - \nu \right)^{2} \right\} \\ &= \exp\left\{ -\frac{u^{2}}{2\sigma^{2}} - \frac{1}{2\tau^{2}K^{2}}\left( v^{2} - u^{2} - K\nu \right)^{2} \right\} \\ &= \exp\left\{ -\frac{u^{2}}{2\sigma^{2}} - \frac{1}{2\tau^{2}K^{2}}\left( u^{2} - v^{2} + K\nu \right)^{2} \right\} \\ &= \exp\left\{ -\frac{1}{2\tau^{2}K^{2}}\left[ u^{2}\frac{\tau^{2}K^{2}}{\sigma^{2}} + \left( u^{2} - v^{2} + K\nu \right)^{2} \right] \right\} \\ &= \exp\left\{ -\frac{1}{2\tau^{2}K^{2}}\left[ \left( v^{2}\frac{\tau^{2}K^{2}}{\sigma^{2}} - K\nu\frac{\tau^{2}K^{2}}{\sigma^{2}} \right) + \left( \frac{\tau^{2}K^{2}}{\sigma^{2}}\left( u^{2} - v^{2} + K\nu \right) + \left( u^{2} - v^{2} + K\nu \right)^{2} \right) \right] \right\} \\ &= \exp\left\{ -\frac{v^{2}}{2\sigma^{2}} + \frac{K\nu}{2\sigma^{2}} + \frac{\tau^{2}K^{2}}{8\sigma^{4}} \right\} \exp\left\{ -\frac{1}{2}\left( \frac{u^{2}}{\tau K} - \frac{v^{2}}{\tau K} + \frac{\tau K}{2\sigma^{2}} + \frac{\nu}{\tau} \right)^{2} \right\} \end{aligned}$

[0149] Let $\alpha = \frac{v^{2}}{\tau K} - \frac{\tau K}{2\sigma^{2}} - \frac{\nu}{\tau}$

[0150] and $A(v) = \exp\left( -\frac{v^{2}}{2\sigma^{2}} + \frac{K\nu}{2\sigma^{2}} + \frac{\tau^{2}K^{2}}{8\sigma^{4}} \right)$

$p_{V}(v) = \begin{cases} \frac{4v}{2\pi\sigma K\tau}A(v)\int_{0}^{v} \exp\left\{ -\frac{1}{2}\left( \frac{u^{2}}{\tau K} - \alpha \right)^{2} \right\} du & v \le \sqrt{KS} \\ \frac{4v}{2\pi\sigma K\tau}A(v)\int_{\sqrt{v^{2} - KS}}^{v} \exp\left\{ -\frac{1}{2}\left( \frac{u^{2}}{\tau K} - \alpha \right)^{2} \right\} du & \sqrt{KS} \le v < \infty \end{cases}$
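The completed-square factorization above can be spot-checked numerically. In this sketch the Python names are hypothetical, `nu` denotes the mean of the initial position distribution (distinct from the velocity `v`), the initial velocity is taken as centered at zero, and the sample parameter values are arbitrary.

```python
import math

def e_direct(u, v, sigma, tau, k, nu):
    """exp{ -u^2/(2 sigma^2) - ((v^2 - u^2)/K - nu)^2 / (2 tau^2) }, the unfactored form."""
    return math.exp(
        -(u**2) / (2.0 * sigma**2) - ((v**2 - u**2) / k - nu) ** 2 / (2.0 * tau**2)
    )

def e_factored(u, v, sigma, tau, k, nu):
    """A(v) * exp{ -(1/2) (u^2/(tau K) - alpha)^2 }, the form obtained by completing the square."""
    alpha = v**2 / (tau * k) - tau * k / (2.0 * sigma**2) - nu / tau
    log_a = (
        -(v**2) / (2.0 * sigma**2)
        + k * nu / (2.0 * sigma**2)
        + tau**2 * k**2 / (8.0 * sigma**4)
    )
    return math.exp(log_a) * math.exp(-0.5 * (u**2 / (tau * k) - alpha) ** 2)

# Arbitrary test point and parameters (assumed values, for illustration only).
lhs = e_direct(0.4, 1.1, 0.8, 0.5, 2.0, 0.3)
rhs = e_factored(0.4, 1.1, 0.8, 0.5, 2.0, 0.3)
```

The two values should agree to rounding error, which is a quick way to confirm the signs in α and A(v).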

[0151] This last integral can be simplified using Taylor expansion. In this example, a five term expansion is used. Let $G(x) = x\int_{0}^{x} \exp\left( -\frac{1}{2}\left( u^{2} - x^{2} - a \right)^{2} \right) du$

[0152] Then $x\,G(x) = e^{-\frac{1}{2}a^{2}}\left( x^{2} - \frac{2}{3}ax^{3} + \frac{16a^{4} + 32a^{2} - 32}{120}x^{6} \right).$

[0153] Note that $A(v)\,e^{-\frac{1}{2}a^{2}} = \exp\left( -\frac{v^{2}}{2\sigma^{2}} - \frac{v^{2}}{2\tau^{2}} \right).$

[0154] Fitting Modeled Lineshapes to Empirically Observed Data

[0155] The mathematical forms derived above for the lineshapes, or shapes of peaks, of the different species based upon the underlying physics of the mass spectrometer, can be applied to the analysis of spectra. Rigorous fits can be performed between empirical mass spectra and synthetic mass spectra generated from mixtures of lineshapes.

[0156] A more complex method for fitting a mass spectrum using modeled lineshape equations uses model basis vectors, such as wavelets and/or vaguelettes. This can be done generally, and/or for a given mass spectrometer design. A basis set is a set of vectors (or sub-spectra), the combination of which can be used to model an observed spectrum. An expansion of the lineshape equations can derive a basis set that is very specific for a given mass spectrometer design.

[0157] A spectrum can be described using the basis vectors. An observed empirical spectrum can be described by a weighted sum of basis vectors, where each basis vector is weighted by multiplication by a coefficient.

[0158] Some embodiments use scaling. The linewidth of the peak corresponding to a species in a mass spectrum is dependent on the time-of-flight of the species. Thus, the linewidth in a mass spectrum may not be constant for all species. One way to address this is to rescale the spectrum such that the linewidths in the scaled spectrum are constant. Such a method can utilize the linewidth as a function of time-of-flight. This can be determined and/or be estimated analytically, empirically, and/or by simulation. Spectra with constant linewidth can be suitable for many signal processing techniques which may not apply to non-constant linewidth spectra.

[0159] Some embodiments use linear combinations and/or matched filtering. In one embodiment, a weighted sum of lineshape functions representing peaks of different species can be fitted to the observed signal by minimizing error. The post-processed data can include the resulting vector of weights, which can represent the abundance of species in the observed mass spectrum.

[0160] Fitting can assume that the spectrum has a fixed set of lineshape centers (including mass-to-charge values) $c_{1}, c_{2}, \ldots, c_{N}$ and a predetermined set of widths for each center $\sigma_{1}, \sigma_{2}, \ldots, \sigma_{N}$. A lineshape function such as $\lambda(c, \sigma, t)$ may be determined for each center-width pair. A synthetic spectrum may include a weighted sum of such lineshape functions: $S(t) = \sum\limits_{1 \le i \le N} w_{i}\,\lambda(c_{i}, \sigma_{i}, t).$

[0161] A minimal error fit can be performed to calculate the parameters $w_{1}, \ldots, w_{N}$. The error function could be the squared error, or a penalized squared error.
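A squared-error fit of the weights can be sketched as follows. The symmetric Gaussian used for λ is only a placeholder (the modeled lineshapes of this disclosure are skewed), the normal equations are solved with a small hand-written elimination routine to keep the example dependency-free, and all function names and sample values are hypothetical.

```python
import math

def gaussian_lineshape(c, sigma, t):
    """Placeholder for lambda(c, sigma, t); a symmetric Gaussian is assumed for illustration."""
    return math.exp(-0.5 * ((t - c) / sigma) ** 2)

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(times, observed, centers, widths, lineshape=gaussian_lineshape):
    """Least-squares weights w_i for S(t) = sum_i w_i * lambda(c_i, sigma_i, t)."""
    basis = [[lineshape(c, s, t) for t in times] for c, s in zip(centers, widths)]
    n = len(basis)
    # Normal equations: (B B^T) w = B y, where each row of B is one sampled lineshape.
    gram = [[sum(p * q for p, q in zip(basis[i], basis[j])) for j in range(n)] for i in range(n)]
    rhs = [sum(p * y for p, y in zip(basis[i], observed)) for i in range(n)]
    return solve_linear(gram, rhs)

# Synthetic, noise-free spectrum with three peaks (illustrative values only).
times = [0.1 * i for i in range(200)]
centers, widths, true_w = [5.0, 6.0, 12.0], [0.5, 0.5, 0.8], [2.0, 1.0, 3.0]
spectrum = [
    sum(w * gaussian_lineshape(c, s, t) for w, c, s in zip(true_w, centers, widths))
    for t in times
]
fitted_w = fit_weights(times, spectrum, centers, widths)
```

A penalized squared error could be accommodated in the same structure, for example by adding a multiple of the identity to the Gram matrix before solving.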

[0162] One advantage of this method is that it reduces the number of data dimensions, since an observed spectrum with a large number of data points can be described by a few parameters. For example, if an observed spectrum has 20,000 data points, and 20 peaks, then the spectrum can be described by 60 points consisting of 20 triplets of center, width, and amplitude. The original 20,000 dimensions have been reduced to 60 dimensions.

[0163] Some embodiments construct convolution operators. Lineshapes constructed analytically, determined empirically, and/or determined by simulation may be used to approximate a convolution operator that replaces a delta peak (e.g., an ideal peak corresponding to the time-of-flight for a particular species) with the corresponding lineshape.

[0164] Some embodiments use Fourier transform deconvolution. The Fourier transform and/or numerical fast Fourier transform of a spectrum such as the rescaled spectrum can be multiplied by a suitable function of the Fourier transform of the lineshape determined analytically, estimated empirically, and/or by simulation. The inverse Fourier transform or inverse fast Fourier transform can be applied to the resulting signal to recover a deconvolved spectrum.
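One possible reading of the Fourier deconvolution step is sketched below. The "suitable function" is assumed here to be a regularized inverse, conj(H)/(|H|² + eps); a direct O(N²) discrete Fourier transform stands in for a fast Fourier transform to keep the example self-contained; and the one-sided exponential lineshape, the circular convolution, and all names are illustrative assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Direct discrete Fourier transform (O(N^2)); a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [val / n for val in out] if inverse else out

def circular_convolve(signal, kernel):
    """Replace each delta peak in `signal` with a shifted copy of `kernel` (periodic boundary)."""
    n = len(signal)
    return [sum(signal[k] * kernel[(j - k) % n] for k in range(n)) for j in range(n)]

def deconvolve(observed, lineshape, eps=1e-12):
    """Regularized Fourier deconvolution: X = Y * conj(H) / (|H|^2 + eps)."""
    y_hat = dft(observed)
    h_hat = dft(lineshape)
    x_hat = [y * h.conjugate() / (abs(h) ** 2 + eps) for y, h in zip(y_hat, h_hat)]
    return [val.real for val in dft(x_hat, inverse=True)]

# Two ideal peaks blurred by a right-skewed exponential lineshape (illustrative values).
N = 64
raw_kernel = [0.5**k for k in range(N)]
kernel = [val / sum(raw_kernel) for val in raw_kernel]
sticks = [0.0] * N
sticks[10], sticks[30] = 2.0, 1.0
blurred = circular_convolve(sticks, kernel)
recovered = deconvolve(blurred, kernel)
```

With noisy data, a larger `eps` would normally be chosen so that frequencies where the lineshape transform is small are not amplified.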

[0165] Some embodiments use scaling and wavelet filtering. Any family of wavelet bases can be chosen, and used to transform a spectrum, such as a rescaled spectrum. A constant linewidth of the spectrum can be used to choose the level of decomposition for approximation and/or thresholding. The wavelet coefficients can be used to describe the spectrum with reduced dimensions and reduced noise.
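A multi-level wavelet decomposition can be sketched with the Haar family, chosen here only because it is the shortest to write down; any other family could be substituted. The function names are hypothetical, and the signal length is assumed to be divisible by two raised to the number of levels.

```python
import math

def haar_decompose(signal, levels):
    """Orthonormal Haar transform; returns [approx, detail_coarsest, ..., detail_finest]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return [approx] + details[::-1]

def haar_reconstruct(coeffs):
    """Invert haar_decompose by merging approximation and detail coefficients level by level."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        merged = []
        for s, d in zip(approx, detail):
            merged.extend([(s + d) / math.sqrt(2.0), (s - d) / math.sqrt(2.0)])
        approx = merged
    return approx

# Toy spectrum of 16 samples decomposed over 3 levels (illustrative values only).
toy = [float(i % 5) for i in range(16)]
toy_coeffs = haar_decompose(toy, 3)
toy_back = haar_reconstruct(toy_coeffs)
# Keeping only the coarse approximation describes the 16 samples with 2 numbers.
coarse_only = [toy_coeffs[0]] + [[0.0] * len(d) for d in toy_coeffs[1:]]
toy_smooth = haar_reconstruct(coarse_only)
```

In this picture the decomposition level plays the role described above: a level whose block length is comparable to the constant linewidth separates peak-scale structure from finer-scale noise.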

[0166] Some embodiments use blocking and wavelet filtering. The spectrum can be divided into blocks whose sizes can be determined by linewidths determined analytically, estimated empirically, and/or by simulation. Any family of wavelet bases can be chosen and used to transform a spectrum, such as the raw spectrum. Different width features can be described in the wavelet coefficients at different levels. The wavelet coefficients from the appropriate decomposition levels can be used to describe the spectrum with reduced dimensions and reduced noise.

[0167] Some embodiments construct new wavelet bases. Analytical lineshapes, empirically determined lineshapes, and/or simulated lineshapes for a given configuration of a mass spectrometer can be used to construct families of wavelets. These wavelets can then be used for filtering.

[0168] Vaguelettes are another choice for basis sets. The vaguelette vectors can include vaguelettes derived from wavelet vectors, vaguelettes derived from modeled lineshapes, and/or vaguelettes derived from empirical lineshapes.

[0169] Some embodiments use wavelet-vaguelette decomposition. Another method based on wavelet filtering may be the wavelet-vaguelette decomposition. The modeled lineshape functions may be used to construct a convolution operator that replaces a delta peak with the corresponding lineshape. Any family of wavelet bases may be chosen, such as ‘db4’, ‘symmlet’, ‘coiflet’. The convolution operator may be applied to the wavelet bases to construct a set of vaguelettes. A minimal error fit may be performed for the coefficients of the vaguelettes to the observed spectrum. The resulting coefficients may be used with the corresponding wavelet vectors to produce a deconvolved spectrum that represents abundances of species in the observed spectrum.

[0170] Some embodiments use thresholding estimators. Another method for deconvolving a rescaled spectrum is the use of the mirror wavelet bases. If the observed spectrum is y=Gx+e, and if H is the pseudo-inverse of G, and if z=He, then let K be the covariance of z. The Kalifa-Mallat mirror wavelet basis can guarantee that K is almost diagonal in that basis. The decomposition coefficients in this basis can be computed with a wavelet packet filter bank requiring O(N) operations. These coefficients can be soft-thresholded with almost optimal denoising properties for the reconstructed synthetic spectra.

[0171] Fitting a basis set to an observed empirical spectrum does not necessarily reduce the dimensionality, or the number of data points needed to describe a spectrum. However, fitting the basis set “changes the basis” and does yield coefficients (parameters) that can be filtered more easily. If many of the coefficients of the basis vectors are close to zero, then the new representation is sparse, and only some of the new basis vectors contain most of the information.

[0172] In another example of filtering noise and reducing dimensionality, thresholding can be performed on the basis vector coefficients. These methods remove or deemphasize the lowest amplitude coefficients, leaving intensity values for only the true signals. Hard thresholding sets a minimum cutoff value, and throws out any peaks whose height is under that threshold; smaller peaks may be considered to be noise. Soft thresholding can scale the numbers and then threshold. Multiple thresholds and/or scales can be used.
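The two thresholding rules can be illustrated on a short list of coefficients. The soft rule shown is the common shrinkage form, sign(c)·max(|c| − cutoff, 0), which is an assumption about what "scale and then threshold" means here; the function names and sample values are hypothetical.

```python
import math

def hard_threshold(coeffs, cutoff):
    """Keep coefficients with magnitude at or above the cutoff; set the rest to zero."""
    return [c if abs(c) >= cutoff else 0.0 for c in coeffs]

def soft_threshold(coeffs, cutoff):
    """Shrink every coefficient toward zero by the cutoff; small ones become zero."""
    return [math.copysign(max(abs(c) - cutoff, 0.0), c) for c in coeffs]

sample = [0.1, -2.0, 0.5, 3.0]
hard_result = hard_threshold(sample, 1.0)   # small entries removed, large ones unchanged
soft_result = soft_threshold(sample, 1.0)   # small entries removed, large ones reduced by 1.0
```

Hard thresholding leaves surviving amplitudes untouched, while the soft rule trades a bias in the surviving amplitudes for a result that varies continuously with the input.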

[0173] FIGS. 7 and 8 are empirical figures that show that real mass spectra have lineshapes with a skewed shape consistent with the results of the pushed-forward lineshapes.

[0174] FIG. 7 illustrates a mass spectrum of a 3 peptide mixture of angiotensin (A), bradykinin (B), and neurotensin (N). Data were collected on an electro-spray-ionization time-of-flight mass spectrometer (ESI-TOF MS). For each peptide, there are two peaks, one each for the +2 and +3 charge states. For example, A(+2) is the angiotensin +2 charge state.

[0175] FIG. 8 illustrates an expanded view of FIG. 7 to display in detail the bradykinin +2 charge state. The various peaks present are due to different isotope compositions of the bradykinin ions in the ensemble (e.g., ¹³C vs. ¹²C). By visual inspection, one can observe that the peak shapes are skewed to the right.

[0176] Conversion between time-of-flight and mass-to-charge is trivial. For example, in some cases mass-to-charge (m/z) = 2*(extraction_voltage/flight_distance²)*time-of-flight². Thus, a time-of-flight distribution can be considered an example of a mass-to-charge distribution.
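The conversion can be sketched in SI units as follows. The sketch assumes the idealized case in which an ion starts at rest, is accelerated through the extraction voltage, and then drifts the full flight distance; the conversion to daltons per elementary charge, the function names, and the sample instrument values are assumptions added for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs (exact SI value)
DALTON = 1.66053906660e-27   # atomic mass constant in kilograms (CODATA 2018)

def mz_from_tof(tof_s, extraction_voltage_v, flight_distance_m):
    """m/z in Da per elementary charge from m/q = 2 V t^2 / d^2 (idealized, SI inputs)."""
    m_over_q = 2.0 * extraction_voltage_v * tof_s**2 / flight_distance_m**2  # kg per coulomb
    return m_over_q * E_CHARGE / DALTON

def tof_from_mz(mz, extraction_voltage_v, flight_distance_m):
    """Inverse relation: t = d * sqrt((m/q) / (2 V))."""
    m_over_q = mz * DALTON / E_CHARGE
    return flight_distance_m * math.sqrt(m_over_q / (2.0 * extraction_voltage_v))

# Hypothetical instrument: 20 kV extraction voltage and a 1 m flight distance.
t_1000 = tof_from_mz(1000.0, 20000.0, 1.0)
mz_back = mz_from_tof(t_1000, 20000.0, 1.0)
```

Because m/z grows with the square of the time-of-flight, equally spaced time bins correspond to mass-to-charge bins that widen toward higher masses.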

[0177] Some embodiments can run on a computer cluster. Networked computers that perform CPU-intensive tasks in parallel can run many jobs in parallel. Daemons running on the computer nodes can accept jobs and notify a server node of each node's progress. A daemon running on the server node can accept results from the computer nodes and keep track of the results. A job control program can run on the server node to allow a user to submit jobs, check on their progress, and collect results. By running computer jobs that operate independently, and distributing necessary information to the computer nodes as a pre-computation, an almost linear speedup in computation time is gained as a function of the number of compute nodes used.

[0178] Other embodiments run on individual computers, supercomputers and/or networked computers that cooperate to a lesser or greater degree. The cluster can be loosely parallel, more like a simple network of individual computers, or tightly parallel, where each computer can be dedicated to the cluster.

[0179] Some embodiments can be implemented on a computer cluster or a supercomputer. A computer cluster or a supercomputer can allow quick and exhaustive sweeps of parameter spaces to determine optimal signatures of diseases such as cancer, and/or discover patterns in cancer.

What is claimed is:
 1. A method of modeling mass spectra, comprising: representing, with a modeled mass-to-charge distribution detected by a mass spectrometer, at least a first plurality of molecules of at least a first molecule type, wherein the modeled mass-to-charge distribution is based on at least a modeled initial distribution of one or more parameters descriptive of the first plurality of molecules, the modeled initial distribution representing at least the first plurality of molecules prior to traveling in the mass spectrometer, the one or more parameters affecting time-of-flight of the first plurality of molecules traveling in the mass spectrometer, wherein the modeled initial distribution represents at least the first plurality of molecules as having a plurality of values for at least one parameter of the one or more parameters; and comparing the modeled mass-to-charge distribution, and an empirical mass-to-charge distribution of at least the first plurality of molecules of at least the first molecule type.
 2. The method of claim 1, wherein the one or more parameters includes at least one of: position, energy, ionization, position focusing, extraction source shape, fringe effects of electric fields, and electronic hardware artifacts.
 3. The method of claim 1, wherein the one or more parameters includes at least position and energy.
 4. The method of claim 3, wherein the modeled mass-to-charge distribution is further based on at least modeling of the first plurality of molecules traveling at least one or more electric field-free regions of the mass spectrometer.
 5. The method of claim 4, wherein the modeled mass-to-charge distribution is further based on at least modeling of the first plurality of molecules traveling at least one or more electric field regions of the mass spectrometer.
 6. The method of claim 1, wherein the modeled initial distribution is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution, such that the modeled mass-to-charge distribution is based on at least the modeled initial distribution representing at least the first plurality of molecules prior to traveling in the mass spectrometer.
 7. The method of claim 1, wherein the plurality of positions of the first plurality of molecules is represented at least by a Gaussian distribution.
 8. The method of claim 7, wherein the modeled initial distribution representing the plurality of positions of the first plurality of molecules at least by the Gaussian distribution is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 9. The method of claim 1, wherein the plurality of positions of the first plurality of molecules is represented by one or more equations based on at least statistical mechanics of ion gases.
 10. The method of claim 9, wherein the modeled initial distribution representing the plurality of positions of the first plurality of molecules at least by the one or more equations based on at least the statistical mechanics of ion gases is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 11. The method of claim 1, wherein the plurality of energies of the first plurality of molecules is represented at least by a Gaussian distribution.
 12. The method of claim 11, wherein the modeled initial distribution representing the plurality of energies of the first plurality of molecules at least by the Gaussian distribution is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 13. The method of claim 1, wherein the plurality of energies of the first plurality of molecules is represented by one or more equations based on at least the statistical mechanics of ion gases.
 14. The method of claim 13, wherein the modeled initial distribution representing the plurality of energies of the first plurality of molecules at least by the one or more equations based on at least the statistical mechanics of ion gases is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 15. The method of claim 1, further comprising: detecting, with the mass spectrometer, the empirical mass-to-charge distribution of at least the first plurality of molecules of at least the first molecule type.
 16. The method of claim 1, further comprising: performing a fit between the empirical mass-to-charge distribution and the modeled mass-to-charge distribution.
 17. The method of claim 16, wherein the fit includes at least a least squares fit.
 18. The method of claim 16, wherein the fit includes at least a penalized least squares fit.
 19. The method of claim 16, further comprising: filtering the fit.
 20. The method of claim 19, wherein filtering the fit includes hard thresholding.
 21. The method of claim 19, wherein filtering the fit includes soft thresholding.
 22. The method of claim 19, wherein filtering the fit includes filtering with a filter bank.
 23. The method of claim 19, wherein filtering uses at least one of wavelet basis vectors and vaguelette basis vectors.
 24. The method of claim 16, wherein performing the fit includes: deriving a plurality of model basis vectors from at least the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of the plurality of model basis vectors.
 25. The method of claim 24, wherein the plurality of model basis vectors includes wavelet vectors.
 26. The method of claim 25, wherein the wavelet vectors include standard wavelet vectors.
 27. The method of claim 25, wherein the wavelet vectors include wavelet vectors derived at least from one or more lineshapes of the modeled mass-to-charge distribution.
 28. The method of claim 25, wherein the wavelet vectors include wavelet vectors derived at least from one or more lineshapes of the empirical mass-to-charge distribution.
 29. The method of claim 24, wherein the plurality of model basis vectors includes vaguelette vectors.
 30. The method of claim 29, wherein the vaguelette vectors are derived at least from one or more wavelet vectors.
 31. The method of claim 29, wherein the vaguelette vectors include vaguelette vectors derived at least from one or more lineshapes of the modeled mass-to-charge distribution.
 32. The method of claim 29, wherein the vaguelette vectors include vaguelette vectors derived at least from one or more lineshapes of the empirical mass-to-charge distribution.
 33. The method of claim 24, further comprising: filtering the weighted sum of the plurality of model basis vectors.
 34. The method of claim 33, wherein filtering the plurality of model basis vectors includes hard thresholding.
 35. The method of claim 33, wherein filtering the plurality of model basis vectors includes soft thresholding.
 36. The method of claim 1, such that the modeled mass-to-charge distribution shows noise reduction compared to the empirical mass-to-charge distribution.
 37. The method of claim 1, such that the modeled mass-to-charge distribution shows data compression compared to the empirical mass-to-charge distribution.
 38. The method of claim 1, such that the modeled mass-to-charge distribution shows data recovery compared to the empirical mass-to-charge distribution.
 39. The method of claim 1, such that the modeled mass-to-charge distribution shows dimensionality reduction compared to the empirical mass-to-charge distribution.
 40. The method of claim 1, such that the modeled mass-to-charge distribution is used for pattern recognition.
 41. The method of claim 1, further comprising: finding one or more proteins indicative of one or more diseases based at least partly on the modeled mass-to-charge distribution.
 42. The method of claim 41, further comprising: diagnosing, based at least partly on the one or more proteins, at least one person with the one or more diseases.
 43. The method of claim 1, further comprising: diagnosing at least one person with one or more diseases based at least partly on the modeled mass-to-charge distribution.
 44. A method of modeling mass spectra, comprising: representing, with a modeled mass-to-charge distribution detected by a mass spectrometer, at least a first plurality of molecules of at least a first molecule type, wherein the modeled mass-to-charge distribution is based on at least a modeled initial distribution of one or more parameters descriptive of the first plurality of molecules, the modeled initial distribution representing at least the first plurality of molecules prior to traveling in the mass spectrometer, the one or more parameters affecting time-of-flight of the first plurality of molecules traveling in the mass spectrometer, wherein the modeled mass-to-charge distribution is derived at least partly from a push forward probability density transformation of the modeled initial distribution by one or more functions based at least partly on a configuration of the mass spectrometer, wherein the modeled initial distribution represents at least the first plurality of molecules as having a plurality of values for at least one parameter of the one or more parameters.
 45. The method of claim 44, wherein the modeled initial distribution represents at least the first plurality of molecules as having the plurality of positions and the plurality of energies.
 46. The method of claim 45, wherein the modeled mass-to-charge distribution is further based on at least modeling of the first plurality of molecules traveling at least one or more electric field-free regions of the mass spectrometer.
 47. The method of claim 46, wherein the modeled mass-to-charge distribution is further based on at least modeling of the first plurality of molecules traveling at least one or more electric field regions of the mass spectrometer.
 48. The method of claim 44, wherein the plurality of positions of the first plurality of molecules is represented at least by a Gaussian distribution.
 49. The method of claim 44, wherein the modeled initial distribution representing the plurality of positions of the first plurality of molecules at least by the Gaussian distribution is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 50. The method of claim 44, wherein the plurality of positions of the first plurality of molecules is represented by one or more equations based on at least statistical mechanics of ion gases.
 51. The method of claim 50, wherein the modeled initial distribution representing the plurality of positions of the first plurality of molecules at least by the one or more equations based on at least the statistical mechanics of ion gases is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 52. The method of claim 44, wherein the plurality of energies of the first plurality of molecules is represented at least by a Gaussian distribution.
 53. The method of claim 52, wherein the modeled initial distribution representing the plurality of energies of the first plurality of molecules at least by the Gaussian distribution is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 54. The method of claim 44, wherein the plurality of energies of the first plurality of molecules is represented by one or more equations based on at least the statistical mechanics of ion gases.
 55. The method of claim 54, wherein the modeled initial distribution representing the plurality of energies of the first plurality of molecules at least by the one or more equations based on at least the statistical mechanics of ion gases is pushed forward by one or more equations representing one or more time of flight functions to at least partly yield the modeled mass-to-charge distribution.
 56. The method of claim 44, further comprising: comparing the modeled mass-to-charge distribution and an empirical mass-to-charge distribution of at least the first plurality of molecules of at least the first molecule type; and performing a fit between the empirical mass-to-charge distribution and the modeled mass-to-charge distribution.
 57. The method of claim 56, wherein the fit includes at least a least squares fit.
 58. The method of claim 56, wherein the fit includes at least a penalized least squares fit.
 59. The method of claim 56, further comprising: filtering the fit.
 60. The method of claim 59, wherein filtering the fit includes hard thresholding.
 61. The method of claim 59, wherein filtering the fit includes filtering with a filter bank.
 62. The method of claim 59, wherein the filtering uses at least one of wavelet basis vectors and vaguelette basis vectors.
 63. The method of claim 59, wherein filtering the fit includes soft thresholding.
 64. The method of claim 56, wherein performing the fit includes: deriving a plurality of model basis vectors from at least the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of the plurality of model basis vectors.
 65. The method of claim 64, wherein the plurality of model basis vectors includes wavelet vectors.
 66. The method of claim 65, wherein the wavelet vectors include standard wavelet vectors.
 67. The method of claim 65, wherein the wavelet vectors include wavelet vectors derived at least from one or more lineshapes of the modeled mass-to-charge distribution.
 68. The method of claim 65, wherein the wavelet vectors include wavelet vectors derived at least from one or more lineshapes of the empirical mass-to-charge distribution.
 69. Themethod of claim 64, wherein the plurality of model basis vectorsincludes vaguelette vectors.
 70. The method of claim 69, wherein thevaguelette vectors are derived at least from one or more waveletvectors.
 71. The method of claim 69, wherein the vaguelette vectorsinclude vaguelette vectors derived at least from one or more lineshapesof the modeled mass-to-charge distribution.
 72. The method of claim 69,wherein the vaguelette vectors include vaguelette vectors derived atleast from one or more lineshapes of the empirical mass-to-chargedistribution.
 73. The method of claim 64, further comprising: filteringthe weighted sum of the plurality of model basis vectors.
 74. The methodof claim 73, wherein filtering the plurality of model basis vectorsincludes hard thresholding.
 75. The method of claim 73, whereinfiltering the plurality of model basis vectors includes softthresholding.
 76. The method of claim 44, such that the modeled mass-to-charge distribution shows noise reduction when compared to the empirical mass-to-charge distribution.
 77. The method of claim 44, such that the modeled mass-to-charge distribution shows data compression when compared to the empirical mass-to-charge distribution.
 78. The method of claim 44, such that the modeled mass-to-charge distribution shows data recovery when compared to the empirical mass-to-charge distribution.
 79. The method of claim 44, such that the modeled mass-to-charge distribution shows dimensionality reduction when compared to the empirical mass-to-charge distribution.
 80. The method of claim 44, such that the modeled mass-to-charge distribution is used for pattern recognition.
 81. The method of claim 44, further comprising: finding one or more proteins indicative of one or more diseases based at least partly on the modeled mass-to-charge distribution.
 82. The method of claim 81, further comprising: diagnosing, from the one or more proteins, at least one person with the one or more diseases.
 83. The method of claim 44, further comprising: diagnosing at least one person with one or more diseases based at least partly on the modeled mass-to-charge distribution.
 84. A method of processing mass spectra, comprising: accessing a mass spectrum; and decorrelating at least two overlapping peaks of the mass spectrum.
 85. The method of claim 84, wherein at least part of the mass spectrum is simulated.
 86. The method of claim 84, wherein at least part of the mass spectrum is empirical.
 87. The method of claim 84, wherein at least part of the mass spectrum is derived at least partly from a push forward probability density transformation of a modeled initial distribution by one or more functions based at least partly on a configuration of a mass spectrometer.
 88. The method of claim 84, wherein the mass spectrum was taken from at least one biological sample.
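
By way of illustration only, and not as a limitation of any claim, the fitting and filtering recited in claims 56 to 63 and 73 to 75 may be sketched as follows. The sketch makes several assumptions that are not taken from the specification: Gaussian lineshapes stand in for the model basis vectors, a ridge penalty stands in for one possible penalized least squares fit, the spectrum is synthetic, and the names `gaussian_basis`, `fit_weights`, `hard_threshold`, and `soft_threshold` are hypothetical.

```python
import numpy as np

def gaussian_basis(mz, centers, width):
    """Return a matrix whose columns are Gaussian lineshapes (an assumed stand-in
    for model basis vectors), one column per candidate center."""
    return np.exp(-0.5 * ((mz[:, None] - centers[None, :]) / width) ** 2)

def fit_weights(basis, spectrum, penalty=0.0):
    """Represent the spectrum as a weighted sum of the basis columns.
    penalty == 0 gives an ordinary least squares fit; penalty > 0 gives a
    ridge fit, used here as one example of a penalized least squares fit."""
    if penalty == 0.0:
        weights, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
        return weights
    gram = basis.T @ basis + penalty * np.eye(basis.shape[1])
    return np.linalg.solve(gram, basis.T @ spectrum)

def hard_threshold(weights, t):
    """Zero every weight whose magnitude is at most t; keep the rest unchanged."""
    return np.where(np.abs(weights) > t, weights, 0.0)

def soft_threshold(weights, t):
    """Shrink every weight toward zero by t; weights within t of zero become zero."""
    return np.sign(weights) * np.maximum(np.abs(weights) - t, 0.0)

# Synthetic "empirical" spectrum: two peaks plus noise (illustrative values only).
rng = np.random.default_rng(0)
mz = np.linspace(1000.0, 1100.0, 400)
centers = np.linspace(1005.0, 1095.0, 10)
basis = gaussian_basis(mz, centers, width=2.0)
true_weights = np.zeros(centers.size)
true_weights[[2, 6]] = [5.0, 3.0]
spectrum = basis @ true_weights + rng.normal(0.0, 0.05, mz.size)

weights = fit_weights(basis, spectrum)                     # least squares fit
ridge_weights = fit_weights(basis, spectrum, penalty=1.0)  # penalized fit
hard_weights = hard_threshold(weights, 0.5)                # filtering by hard thresholding
soft_weights = soft_threshold(weights, 0.5)                # filtering by soft thresholding
filtered_spectrum = basis @ hard_weights                   # filtered weighted sum
```

In this sketch the empirical spectrum of 400 samples is summarized by the few weights that survive thresholding, which is one way the filtered weighted sum can exhibit noise reduction and dimensionality reduction relative to the empirical data.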